TREC-9 CLIR at CUHK: Disambiguation by Similarity Values Between Adjacent Words
نویسندگان
چکیده
We investigated the dictionary-based query translation method combining the translation disambiguation process using statistic cooccurrence information trained from the provided corpus. We believe that neighboring words tend to be related in contextual meaning and have higher chance of co-occurrence particularly if adjacent words (two or more) compose a phrase. The correct translation equivalents of co-occurrence pattern in a source language are more likely to co-occur in a target language documents than in conjunction with any incorrect translation equivalents within a certain range of contextual window size. In this work, we tested several methods to calculate the degree of co-occurrence and used them as the basis of disambiguation. Different from most disambiguation methods which usually select one best translation equivalent for a word, we select the best translation equivalent pairs for two adjacent words. The final translated queries are the concatenation of all overlapped adjacent word translation pairs after disambiguation.
منابع مشابه
TREC-9 CLIR Experiments at MSRCN
In TREC-9, we participated in the English-Chinese Cross-Language Information Retrieval (CLIR) track. Our work involved two aspects: finding good methods for Chinese IR, and finding effective translation means between English and Chinese. On Chinese monolingual retrieval, we investigated the use of different entities as indexes, pseudorelevance feedback, and length normalization, and examined th...
متن کاملTwenty-One at TREC-8: using Language Technology for Information Retrieval
This paper describes the oÆcial runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-Hoc and CLIR t...
متن کاملPhrase Identification in Cross-Language Information Retrieval
Term-sense ambiguity and the difficulty in translating phrases are the main sources of problem in dictionarybased cross-language information retrieval (CLIR) approaches. We propose a term similarity-based translationphrase identification technique to enhance the retrieval effectiveness of a dictionary-based query translation method. The technique identifies noun-phrases in the target language b...
متن کاملCross-language information retrieval in Twenty-One: Using one, some or all possible translations?
This paper gives an overview of the tools and methods for Cross-Language Information Retrieval (CLIR) that were developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whe...
متن کاملTREC-10 Experiments at University of Maryland CLIR and Video
The University of Maryland Researchers participated in both the Arabic-English Cross Language Information Retrieval (CLIR) and Video tracks of TREC-10. In the CLIR track, our goal was to explore effective monolingual Arabic IR techniques and effective query translation from English to Arabic for cross language IR. For the monolingual part, the use of the different index terms including words, s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000